A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
Authors
Abstract
This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facility. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. In particular, half precision is used in many very large scale applications, such as those associated with machine learning.
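The grouping idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the BBLAS API proposed in the article: the names `Group`, `gemm`, and `batched_gemm` are hypothetical, and the only assumption taken from the abstract is that problems are collected into uniformly sized groups and handled by a single call.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class Group:
    """Hypothetical container: one uniformly sized group of problems.

    Every problem in the group shares m, n, k, alpha and beta.
    """
    m: int
    n: int
    k: int
    alpha: float
    beta: float
    a: List[Matrix]  # each matrix is m x k
    b: List[Matrix]  # each matrix is k x n
    c: List[Matrix]  # each matrix is m x n, updated in place


def gemm(m, n, k, alpha, a, b, beta, c):
    """Reference C = alpha * A * B + beta * C for one small problem."""
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]


def batched_gemm(groups: List[Group]) -> None:
    """Process every independent problem of every group in one call."""
    for g in groups:
        if not (len(g.a) == len(g.b) == len(g.c)):
            raise ValueError("a group needs equally many A, B and C matrices")
        # Problems are independent, so a real implementation could run
        # them in parallel; this sketch simply loops over them.
        for a, b, c in zip(g.a, g.b, g.c):
            gemm(g.m, g.n, g.k, g.alpha, a, b, g.beta, c)


# Usage: two groups with different sizes, handled by a single call.
small = Group(2, 2, 2, 1.0, 0.0,
              a=[[[1, 2], [3, 4]], [[1, 0], [0, 1]]],
              b=[[[5, 6], [7, 8]], [[2, 3], [4, 5]]],
              c=[[[0, 0], [0, 0]], [[0, 0], [0, 0]]])
dot = Group(1, 1, 3, 2.0, 1.0,
            a=[[[1, 2, 3]]],
            b=[[[1], [1], [1]]],
            c=[[[10]]])
batched_gemm([small, dot])
```

If all matrices had the same size, the list passed to `batched_gemm` would contain just one group, which matches the single-group case mentioned in the abstract.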
Similar resources
An Extended Set of Fortran Basic Linear Algebra Subprograms
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations which should provide for efficient and portable implementations of algorithms for high performance computers. An Extended Set of Fortran Basic Linear Algebra Subprograms Jack J. Dongarra † Mathematics and Computer Science Division Argonne National Laboratory ...
HeteroPBLAS: A Set of Parallel Basic Linear Algebra Subprograms Optimized for Heterogeneous Computational Clusters
This paper presents a software library, called Heterogeneous PBLAS (HeteroPBLAS), which provides optimized parallel basic linear algebra subprograms for Heterogeneous Computational Clusters. This library is written on top of HeteroMPI and PBLAS whose building blocks, the de facto standard kernels for matrix and vector operations (BLAS) and message passing communication (BLACS), are optimize...
A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms (PBLAS). The PBLAS are targeted at distributed vector-vector, matrix-vector and matrix-matrix operations with the aim of simplifying the parallelization of linear algebra codes, especially when implemented on top of the sequential BLAS. At first glance, because of the apparent simplicity of its s...
Towards Reversible Basic Linear Algebra Subprograms: A Performance Study
Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications. Making applications reversible by relying on computation rather than on memory is ideal for large scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case ...
Journal
Journal title: ACM Transactions on Mathematical Software
Year: 2021
ISSN: ['0098-3500', '1557-7295']
DOI: https://doi.org/10.1145/3431921